AITopics | nonconvex stochastic optimization

c203e4a1bdef9372cb9864bfc9b511cc-Paper.pdf

Neural Information Processing SystemsFeb-11-2026, 00:38:30 GMT

iteration, optimization, stochastic optimization, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Neural Information Processing SystemsNov-20-2025, 23:16:57 GMT

Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD) have been widely used in distributed machine learning, e.g., training large collaborative filtering systems and deep neural networks. Due to current technical limit, however, establishing convergence properties of Async-MSGD for these highly complicated nonoconvex problems is generally infeasible. Therefore, we propose to analyze the algorithm through a simpler but nontrivial nonconvex problems --- streaming PCA. This allows us to make progress toward understanding Aync-MSGD and gaining new insights for more general problems. Specifically, by exploiting the diffusion approximation of stochastic optimization, we establish the asymptotic rate of convergence of Async-MSGD for streaming PCA. Our results indicate a fundamental tradeoff between asynchrony and momentum: To ensure convergence and acceleration through asynchrony, we have to reduce the momentum (compared with Sync-MSGD). To the best of our knowledge, this is the first theoretical attempt on understanding Async-MSGD for distributed nonconvex stochastic optimization. Numerical experiments on both streaming PCA and training deep neural networks are provided to support our findings for Async-MSGD.

acceleration tradeoff, async-msgd, momentum and asynchrony, (6 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.60)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)

Add feedback

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Tianyi Liu, Shiyang Li, Jianping Shi, Enlu Zhou, Tuo Zhao

Neural Information Processing SystemsNov-20-2025, 21:22:48 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, async-msgd, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.05)
North America > Canada > Quebec > Montreal (0.04)
Europe > Russia (0.04)
(2 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Stochastic Anderson Mixing for Nonconvex Stochastic Optimization

Neural Information Processing SystemsAug-17-2025, 05:24:26 GMT

Anderson mixing (AM) is an acceleration method for fixed-point iterations.

artificial intelligence, machine learning, optimization, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.84)

Add feedback

Stochastic Anderson Mixing for Nonconvex Stochastic Optimization

Neural Information Processing SystemsJan-19-2025, 00:34:48 GMT

Anderson mixing (AM) is an acceleration method for fixed-point iterations. Despite its success and wide usage in scientific computing, the convergence theory of AM remains unclear, and its applications to machine learning problems are not well explored. In this paper, by introducing damped projection and adaptive regularization to the classical AM, we propose a Stochastic Anderson Mixing (SAM) scheme to solve nonconvex stochastic optimization problems. Under mild assumptions, we establish the convergence theory of SAM, including the almost sure convergence to stationary points and the worst-case iteration complexity. Moreover, the complexity bound can be improved when randomly choosing an iterate as the output. To further accelerate the convergence, we incorporate a variance reduction technique into the proposed SAM.

convergence theory, nonconvex stochastic optimization, stochastic anderson mixing, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.65)

Add feedback

Reviews: Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Neural Information Processing SystemsOct-8-2024, 10:28:12 GMT

The fundamental claim [line 101 & 239] is that asymptotically, for streaming PCA, the delay tau is allowed to scale as (1 - mu) 2 / sqrt(eta), where mu is the step size and mu the momentum parameter. Major Comments Before we discuss the proof, I think the introduction is somewhat misleading. In line 76, the authors point out previous work all focus on analyzing convergence to a first order optimal solution. The readers can be confused that this paper improved the results of previous work. However, the problems studies in those paper and streaming PCA are different.

acceleration tradeoff, momentum and asynchrony, nonconvex stochastic optimization, (8 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.40)

Add feedback

Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization

Ma, Xuezhe

arXiv.org Machine LearningOct-1-2020

Importantly, the update and storage of the diagonal approximation of Hessian is as efficient as adaptive first-order optimization methods with linear complexity for both time and memory. To handle nonconvexity, we replace the Hessian with its rectified absolute value, which is guaranteed to be positive-definite. Nonconvex stochastic optimization is of core practical importance in many fields of machine learning, in particular for training deep neural networks (DNNs). First-order gradient-based optimization algorithms, conceptually attractive due to their linear efficiency on both the time and memory complexity, have led to tremendous progress and impressive successes. However, one disadvantage of SGD is that the gradients in different directions are scaled uniformly, resulting in limited convergence speed and sensitive choice of the learning rate, and thus has spawned a lot of recent interest in accelerating SGD from the algorithmic and practical perspectives. Recently, many adaptive first-order optimization methods have been proposed to achieve rapid training progress with element-wise scaled learning rates, and we can only mention a few here due to space limits. In their pioneering work, Duchi et al. (2011) proposed AdaGrad, which scales the gradient by the square root of the accumulative square gradients from the first iteration.

artificial intelligence, machine learning, pollo, (17 more...)

arXiv.org Machine Learning

2009.13586

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > Germany > Berlin (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Liu, Tianyi, Li, Shiyang, Shi, Jianping, Zhou, Enlu, Zhao, Tuo

Neural Information Processing SystemsFeb-14-2020, 13:12:05 GMT

Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD) have been widely used in distributed machine learning, e.g., training large collaborative filtering systems and deep neural networks. Due to current technical limit, however, establishing convergence properties of Async-MSGD for these highly complicated nonoconvex problems is generally infeasible. Therefore, we propose to analyze the algorithm through a simpler but nontrivial nonconvex problems --- streaming PCA. This allows us to make progress toward understanding Aync-MSGD and gaining new insights for more general problems. Specifically, by exploiting the diffusion approximation of stochastic optimization, we establish the asymptotic rate of convergence of Async-MSGD for streaming PCA. Our results indicate a fundamental tradeoff between asynchrony and momentum: To ensure convergence and acceleration through asynchrony, we have to reduce the momentum (compared with Sync-MSGD).

async-msgd, momentum and asynchrony, nonconvex stochastic optimization, (3 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Add feedback

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Liu, Tianyi, Li, Shiyang, Shi, Jianping, Zhou, Enlu, Zhao, Tuo

Neural Information Processing SystemsDec-31-2018

Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD) have been widely used in distributed machine learning, e.g., training large collaborative filtering systems and deep neural networks. Due to current technical limit, however, establishing convergence properties of Async-MSGD for these highly complicated nonoconvex problems is generally infeasible. Therefore, we propose to analyze the algorithm through a simpler but nontrivial nonconvex problems --- streaming PCA. This allows us to make progress toward understanding Aync-MSGD and gaining new insights for more general problems. Specifically, by exploiting the diffusion approximation of stochastic optimization, we establish the asymptotic rate of convergence of Async-MSGD for streaming PCA. Our results indicate a fundamental tradeoff between asynchrony and momentum: To ensure convergence and acceleration through asynchrony, we have to reduce the momentum (compared with Sync-MSGD). To the best of our knowledge, this is the first theoretical attempt on understanding Async-MSGD for distributed nonconvex stochastic optimization. Numerical experiments on both streaming PCA and training deep neural networks are provided to support our findings for Async-MSGD.

artificial intelligence, async-msgd, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
North America > Canada (0.14)
Asia (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

Filters

Collaborating Authors

nonconvex stochastic optimization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

c203e4a1bdef9372cb9864bfc9b511cc-Paper.pdf

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Stochastic Anderson Mixing for Nonconvex Stochastic Optimization

Stochastic Anderson Mixing for Nonconvex Stochastic Optimization

Reviews: Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization